
Enhance parallel foreign scan support#1571

Closed
MisterRaindrop wants to merge 1 commit into apache:main from MisterRaindrop:fdw_parallel_support

Conversation

Contributor

@MisterRaindrop MisterRaindrop commented Feb 10, 2026

Support parallel foreign scans and add a new mock FDW for testing them.

Key changes include:

  • Implementation of parallel_foreign_scan_test_fdw to generate synthetic rows and support parallel scanning.
  • Modifications to the optimizer to generate gather paths for foreign tables with parallel capabilities.
  • Updates to execMain.c to enable parallel mode for gather nodes based on the execution context.
  • Addition of test cases to validate the functionality of the new FDW in both coordinator and all-segments modes.


Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update


Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel




@avamingli avamingli left a comment


Overall, FDW parallel scan is a direction worth exploring, but this approach is too rough. The core problems are:

  1. locus transition semantics for Gather in an MPP context haven't been thought through, and the changes are too broad.

  2. FDW is a black box from the database's perspective.
    For heap tables we have parallel scan (divide work by pages), for AO/AOCS we have parallel scan (divide work by files) — the work partitioning is well-defined.
    But for FDWs, the parallel behavior depends entirely on the FDW's own implementation. If an FDW (say file_fdw) sets parallel_safe = true following planner's parallel logic but doesn't actually implement the DSM parallel callbacks (EstimateDSMForeignScan, InitializeDSMForeignScan, InitializeWorkerForeignScan), then multiple workers will each scan the full dataset, producing duplicate rows.

Comment on lines +3021 to +3030
/* Inherit locus from subpath — Gather collects within the same segment,
* data distribution across segments doesn't change. */
pathnode->path.locus = subpath->locus;
pathnode->path.locus.parallel_workers = 0; /* Gather output is single-stream */

pathnode->path.motionHazard = subpath->motionHazard;
pathnode->path.barrierHazard = subpath->barrierHazard;
pathnode->path.rescannable = false;
pathnode->path.sameslice_relids = subpath->sameslice_relids;


@avamingli avamingli Feb 11, 2026


create_gather_path was guarded by Assert(false) for a reason — the locus semantics of Gather in an MPP context were never fully defined or implemented.

The comment is wrong, and the conclusion is wrong.
Gather does change the data distribution.
Simplest example: a Hash-distributed table's parallel partial scan has locus HashWorkers (data is hash-distributed across segments, and within each segment it's split across workers).
Gather collects all workers' data back to the leader process, so the locus should become Hash. You can't just copy the subpath locus and zero out parallel_workers — that's incorrect for HashWorkers, SegmentGeneral, and possibly other locus types as well.
Also, this change is global — create_gather_path isn't FDW-specific. Once the Assert is gone, every code path that calls this function is affected. The hardest case is JOINs — mixing Gather with CBDB-style parallelism (Motion/slice) introduces a ton of problems. This PR doesn't seem to have considered any of that.

@MisterRaindrop
Contributor Author

Overall, FDW parallel scan is a direction worth exploring, but this approach is too rough. The core problems are:

  1. locus transition semantics for Gather in an MPP context haven't been thought through, and the changes are too broad.
  2. FDW is a black box from the database's perspective.
    For heap tables we have parallel scan (divide work by pages), for AO/AOCS we have parallel scan (divide work by files) — the work partitioning is well-defined.
    But for FDWs, the parallel behavior depends entirely on the FDW's own implementation. If an FDW (say file_fdw) sets parallel_safe = true following planner's parallel logic but doesn't actually implement the DSM parallel callbacks (EstimateDSMForeignScan, InitializeDSMForeignScan, InitializeWorkerForeignScan), then multiple workers will each scan the full dataset, producing duplicate rows.

I'm not very familiar with Cloudberry. Still learning.

FDW itself is a black box; its behavior largely depends on how the user implements it. My understanding is that users need to take responsibility for their own implementations. Additionally, I could enable Gather only for FDWs and keep it disabled in all other cases; would that preserve the parallel-processing advantages of PostgreSQL?

Additionally, I've looked into other aspects of FDW parallelism. Currently, it seems there is no optimal solution.

So, should we aim to implement parallelism that is transparent to users? Or are there better approaches? Could you share some idea?

@avamingli
Contributor

Overall, FDW parallel scan is a direction worth exploring, but this approach is too rough. The core problems are:

  1. locus transition semantics for Gather in an MPP context haven't been thought through, and the changes are too broad.
  2. FDW is a black box from the database's perspective.
    For heap tables we have parallel scan (divide work by pages), for AO/AOCS we have parallel scan (divide work by files) — the work partitioning is well-defined.
    But for FDWs, the parallel behavior depends entirely on the FDW's own implementation. If an FDW (say file_fdw) sets parallel_safe = true following planner's parallel logic but doesn't actually implement the DSM parallel callbacks (EstimateDSMForeignScan, InitializeDSMForeignScan, InitializeWorkerForeignScan), then multiple workers will each scan the full dataset, producing duplicate rows.

I'm not very familiar with Cloudberry. Still learning.

FDW itself is a black box; its behavior largely depends on how the user implements it. My understanding is that users need to take responsibility for their own implementations. Additionally, I could enable Gather only for FDWs and keep it disabled in all other cases; would that preserve the parallel-processing advantages of PostgreSQL?

Additionally, I've looked into other aspects of FDW parallelism. Currently, it seems there is no optimal solution.

So, should we aim to implement parallelism that is transparent to users? Or are there better approaches? Could you share some idea?

Neither PostgreSQL nor Cloudberry supports parallel FDW scans; that's a deliberate decision, not an oversight.

On the implementation side: having the kernel generate partial paths for FDW will cause FDWs that don't implement parallel scan callbacks to silently produce wrong results (e.g. duplicate rows). That's a kernel bug, not a user error — we can't shift that responsibility to FDW authors. And mixing Gather with CBDB-style parallelism remains fundamentally broken — the locus handling is wrong, and none of the issues I raised (joins, locus transitions, the overly broad execMain.c change) have been addressed.

More importantly, before discussing how, we need to answer why. What real-world problem does this solve in an MPP system where FDW is already used across segments? And given the risks I mentioned above — broken locus transitions, silent wrong results for existing FDWs, untested join/subquery interactions — even if it can be done, is it worth the complexity? If you want to push this forward, you need to make the case clearly: what's the motivation, and convince us that all the issues raised have sound solutions.

@MisterRaindrop
Contributor Author

Overall, FDW parallel scan is a direction worth exploring, but this approach is too rough. The core problems are:

  1. locus transition semantics for Gather in an MPP context haven't been thought through, and the changes are too broad.
  2. FDW is a black box from the database's perspective.
    For heap tables we have parallel scan (divide work by pages), for AO/AOCS we have parallel scan (divide work by files) — the work partitioning is well-defined.
    But for FDWs, the parallel behavior depends entirely on the FDW's own implementation. If an FDW (say file_fdw) sets parallel_safe = true following planner's parallel logic but doesn't actually implement the DSM parallel callbacks (EstimateDSMForeignScan, InitializeDSMForeignScan, InitializeWorkerForeignScan), then multiple workers will each scan the full dataset, producing duplicate rows.

I'm not very familiar with Cloudberry. Still learning.
FDW itself is a black box; its behavior largely depends on how the user implements it. My understanding is that users need to take responsibility for their own implementations. Additionally, I could enable Gather only for FDWs and keep it disabled in all other cases; would that preserve the parallel-processing advantages of PostgreSQL?
Additionally, I've looked into other aspects of FDW parallelism. Currently, it seems there is no optimal solution.
So, should we aim to implement parallelism that is transparent to users? Or are there better approaches? Could you share some idea?

Neither PostgreSQL nor Cloudberry supports parallel FDW scans; that's a deliberate decision, not an oversight.

On the implementation side: having the kernel generate partial paths for FDW will cause FDWs that don't implement parallel scan callbacks to silently produce wrong results (e.g. duplicate rows). That's a kernel bug, not a user error — we can't shift that responsibility to FDW authors. And mixing Gather with CBDB-style parallelism remains fundamentally broken — the locus handling is wrong, and none of the issues I raised (joins, locus transitions, the overly broad execMain.c change) have been addressed.

More importantly, before discussing how, we need to answer why. What real-world problem does this solve in an MPP system where FDW is already used across segments? And given the risks I mentioned above — broken locus transitions, silent wrong results for existing FDWs, untested join/subquery interactions — even if it can be done, is it worth the complexity? If you want to push this forward, you need to make the case clearly: what's the motivation, and convince us that all the issues raised have sound solutions.

Parallel FDW scans primarily address slow data loading. This functionality was already implemented in earlier versions of PostgreSQL. Now I am attempting to bring this feature to an MPP system. In simple tests, parallelization delivered a 1–2x performance improvement, and such gains matter in performance-sensitive business scenarios, which is why I am working to introduce this functionality. Alternatively, we could discuss the implementation plan in the issue tracker.

@MisterRaindrop
Contributor Author

Thank you for the detailed review comments. Regarding the core issues raised (the correctness risk that kernel-generated partial paths pose for FDWs that do not implement the parallel callbacks, the mixing of Gather with CBDB's gang model, locus transitions, and the scope of the changes in execMain.c), I agree that these all need to be addressed seriously.

After reconsideration, I am inclined to withdraw the kernel-side modifications and adopt a pure FDW-layer solution instead. The core idea is:

  1. Do not modify the kernel's partial path generation, execMain.c, or locus logic—avoiding all the risks mentioned above.
  2. The FDW directly uses CBDB's existing parallel variables (ParallelWorkerNumberOfSlice / TotalParallelWorkerNumberOfSlice) to obtain the current worker number and total count.
  3. During execution, the FDW calculates the virtual segment ID based on these two values, modifies the HTTP header sent to PXF, and allows PXF's round-robin sharding mechanism to automatically distribute data evenly among all gang workers.

This solution requires no kernel modifications and will not affect other FDWs.

I would like to confirm: Is this direction reasonable? Are the variables ParallelWorkerNumberOfSlice and TotalParallelWorkerNumberOfSlice stable and reliable under the current CBDB parallel framework? Or do you have a more recommended way for the FDW to perceive gang parallel information?

@MisterRaindrop
Contributor Author

Among the existing parallel frameworks in CBDB, after FDW registers a partial path via add_partial_path(), can the planner correctly trigger gang expansion and set ParallelWorkerNumberOfSlice? Or does it require additional kernel adaptation?

@avamingli
Contributor

Parallel FDW primarily addresses the issue of slow data loading. This functionality was already implemented in earlier versions of PostgreSQL.

Where exactly? Are you referring to this commit?

@avamingli
Contributor

This solution requires no kernel modifications and will not affect other FDWs.

I would like to confirm: Is this direction reasonable?

That sounds more reasonable.

@avamingli
Contributor

Are the variables ParallelWorkerNumberOfSlice and TotalParallelWorkerNumberOfSlice stable and reliable under the current CBDB parallel framework?

Yes, they are stable and reliable under the current CBDB parallel framework. But I'm not sure how you plan to use them.

During execution, the FDW calculates the virtual segment ID based on these two values, modifies the HTTP header sent to PXF, and allows PXF's round-robin sharding mechanism to automatically distribute data evenly among all gang workers.

I'm not entirely sure I follow — isn't this essentially how MPP PXF works today? As for deriving the virtual segment ID from these two values: I'm not sure that's enough; off the top of my head, different slices on the same segment could have the same parallel worker numbers.

@MisterRaindrop
Contributor Author

Are the variables ParallelWorkerNumberOfSlice and TotalParallelWorkerNumberOfSlice stable and reliable under the current CBDB parallel framework?

Yes, they are stable and reliable under the current CBDB parallel framework. But I'm not sure how you plan to use them.

During execution, the FDW calculates the virtual segment ID based on these two values, modifies the HTTP header sent to PXF, and allows PXF's round-robin sharding mechanism to automatically distribute data evenly among all gang workers.

I'm not entirely sure I follow — isn't this essentially how MPP PXF works today? As for deriving the virtual segment ID from these two values: I'm not sure that's enough; off the top of my head, different slices on the same segment could have the same parallel worker numbers.

Yes, essentially, it reuses the existing MPP round-robin sharding mechanism of PXF—by modifying the segment ID/count in the HTTP header, PXF can distribute data to N×W gang workers instead of N physical segments. No changes are required on the PXF server side.

Regarding ParallelWorkerNumberOfSlice: From the assignment logic in parallel.c, workers on the same segment are assigned incrementally via DSM entry (0, 1, 2, ...), which should be unique. However, I want to confirm: In the CBDB parallel framework, is this value guaranteed to be unique within the same slice on the same segment?
